NSF PAR Search | NSF Public Access Repository

Convergo: Multi-SLO-Aware Scheduling for Heterogeneous AI Accelerators on Edge Devices

https://doi.org/10.1109/EDGE67623.2025.00022

Jiang, Ting; Hao, Jianwei; Harsha, Sushruth; Rachmanto, Rakandhiya D; Setyanto, Arief; Ramaswamy, Lakshmish; Kim, In Kee (July 2025, IEEE)

With the growing prevalence of edge AI, systems are increasingly required to meet stringent and diverse service level objectives (SLOs), such as maintaining specific accuracy levels, ensuring sufficient inference throughput, and meeting deadlines, often simultaneously. However, concurrently achieving these varied and complex SLOs is particularly challenging due to the resource constraints of edge devices and the heterogeneity of AI accelerators. To address this gap, we present a novel AI scheduling framework, Convergo, which uniquely integrates heterogeneous accelerator management, multi-tenancy, and multi-SLO prioritization into one scheduling solution. Convergo not only leverages heterogeneous AI accelerators and supports AI multi-tenancy, but also integrates scheduling heuristics to meet multiple SLOs concurrently. Convergo enables the simultaneous satisfaction of multiple/complex SLO requirements (e.g., accuracy, throughput, and deadline constraints). The scheduling algorithm prioritizes inference requests, imposes critical constraints, and selects the best model combinations for current inferencing. We evaluated Convergo on the Jetson Xavier platform with portable TPU accelerators across various AI workloads, demonstrating its effectiveness. The evaluation results show that Convergo outperforms state-of-the-art baselines, achieving over 90% satisfaction of all three distinct SLO requirements simultaneously while maintaining approximately 95% satisfaction for individual SLOs. Furthermore, Convergo achieves these results with negligible overhead, making it a promising solution for edge AI systems.
Free, publicly-accessible full text available July 7, 2026
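The Convergo abstract above describes a scheduler that prioritizes inference requests, imposes constraints, and selects a model combination. A minimal sketch of that general idea, not the paper's algorithm, might look as follows. All names (`Candidate`, `Request`, `pick_candidate`, `schedule`), the earliest-deadline ordering, the lowest-latency tie-break, and the example numbers are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """A hypothetical (model, accelerator) pairing with profiled metrics."""
    model: str
    accelerator: str
    accuracy: float        # fraction in [0, 1]
    throughput_fps: float  # inferences per second
    latency_ms: float      # per-inference latency


@dataclass(frozen=True)
class Request:
    """A hypothetical inference request carrying three SLOs."""
    name: str
    min_accuracy: float
    min_throughput_fps: float
    deadline_ms: float


def pick_candidate(req: Request, candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the lowest-latency candidate meeting all three SLOs, or None."""
    feasible = [
        c for c in candidates
        if c.accuracy >= req.min_accuracy
        and c.throughput_fps >= req.min_throughput_fps
        and c.latency_ms <= req.deadline_ms
    ]
    return min(feasible, key=lambda c: c.latency_ms) if feasible else None


def schedule(requests: List[Request],
             candidates: List[Candidate]) -> List[Tuple[str, Optional[Candidate]]]:
    """Serve requests in earliest-deadline-first order (an assumed priority rule)."""
    ordered = sorted(requests, key=lambda r: r.deadline_ms)
    return [(r.name, pick_candidate(r, candidates)) for r in ordered]


# Made-up profile data, for illustration only.
CANDIDATES = [
    Candidate("small-net", "tpu", 0.70, 60.0, 12.0),
    Candidate("large-net", "gpu", 0.90, 25.0, 35.0),
    Candidate("large-net", "tpu", 0.88, 15.0, 60.0),
]
REQUESTS = [
    Request("camera", 0.85, 20.0, 50.0),
    Request("sensor", 0.60, 50.0, 20.0),
    Request("strict", 0.95, 10.0, 100.0),
]

if __name__ == "__main__":
    for name, choice in schedule(REQUESTS, CANDIDATES):
        print(name, "->", choice)
```

A request for which no candidate satisfies every SLO maps to `None` here; a real scheduler would need a policy for that case (for example, relaxing the lowest-priority SLO), which this sketch does not attempt.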
A Study of Java Microbenchmark Tail Latencies

https://doi.org/10.1145/3578245.3584690

He, Sen; Kim, In Kee; Wang, Wei (April 2023, Companion of the 2023 ACM/SPEC International Conference on Performance Engineering)

Full Text Available
CloudBruno: A Low-Overhead Online Workload Prediction Framework for Cloud Computing

https://doi.org/10.1109/IC2E55432.2022.00027

Jayakumar, Vinodh Kumaran; Arbat, Shivani; Kim, In Kee; Wang, Wei (September 2022, IEEE International Conference on Cloud Engineering (IC2E))

Full Text Available
Wasserstein Adversarial Transformer for Cloud Workload Prediction

https://doi.org/10.1609/aaai.v36i11.21509

Arbat, Shivani; Jayakumar, Vinodh Kumaran; Lee, Jaewoo; Wang, Wei; Kim, In Kee (June 2022, Proceedings of the AAAI Conference on Artificial Intelligence)

Predictive VM (Virtual Machine) auto-scaling is a promising technique to optimize cloud applications’ operating costs and performance. Understanding the job arrival rate is crucial for accurately predicting future changes in cloud workloads and proactively provisioning and de-provisioning VMs for hosting the applications. However, developing a model that accurately predicts cloud workload changes is extremely challenging due to the dynamic nature of cloud workloads. Long Short-Term Memory (LSTM) models have been developed for cloud workload prediction. Unfortunately, the state-of-the-art LSTM model leverages recurrences to predict, which naturally adds complexity and increases the inference overhead as input sequences grow longer. To develop a cloud workload prediction model with high accuracy and low inference overhead, this work presents a novel time-series forecasting model called WGAN-gp Transformer, inspired by the Transformer network and improved Wasserstein-GANs. The proposed method adopts a Transformer network as a generator and a multi-layer perceptron as a critic. The extensive evaluations with real-world workload traces show WGAN-gp Transformer achieves 5× faster inference time with up to 5.1% higher prediction accuracy against the state-of-the-art. We also apply WGAN-gp Transformer to auto-scaling mechanisms on Google cloud platforms, and the WGAN-gp Transformer-based auto-scaling mechanism outperforms the LSTM-based mechanism by significantly reducing VM over-provisioning and under-provisioning rates.
Full Text Available
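For readers unfamiliar with the "improved Wasserstein-GAN" (WGAN-gp) training scheme named in the abstract above, the critic objective is usually written with a gradient penalty, as below. This is the commonly stated general form, not a formula quoted from this paper; the symbols ($D$ for the critic, $P_r$ and $P_g$ for the real and generated distributions, $\lambda$ for the penalty weight) are notation assumed here.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\,\tilde{x}, \quad \epsilon \sim U[0, 1]
```

In the setting the abstract describes, the generator producing $\tilde{x}$ would be the Transformer network and $D$ the multi-layer perceptron critic.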
Privacy invasion via smart-home hub in personal area networks

https://doi.org/10.1016/j.pmcj.2022.101675

Setayeshfar, Omid; Subramani, Karthika; Yuan, Xingzi; Dey, Raunak; Hong, Dezhi; Kim, In Kee; Lee, Kyu Hyung (September 2022, Pervasive and Mobile Computing)

Full Text Available
Performance Testing for Cloud Computing with Dependent Data Bootstrapping

https://doi.org/10.1109/ASE51524.2021.9678687

He, Sen; Liu, Tianyi; Lama, Palden; Lee, Jaewoo; Kim, In Kee; Wang, Wei (November 2021, IEEE/ACM International Conference on Automated Software Engineering, 2021)

Full Text Available
ChatterHub: Privacy Invasion via Smart Home Hub

Setayeshfar, Omid; Subramani, Karthika; Yuan, Xingzi; Dey, Raunak; Hong, Dezhi; Lee, Kyu Hyung; Kim, In Kee (January 2021, Proceedings of the 2021 IEEE Conference on Smart Computing (SmartComp))
Smart-home devices promise to make users’ lives more convenient. However, at the same time, such devices increase the possibility of breaching users’ privacy as they are tightly connected to the users’ daily lives and activities. To address privacy invasion through smart-home devices, we present ChatterHub. This novel approach accurately identifies smart-home devices’ activities with minimal monitoring of encrypted traffic in the home network. ChatterHub targets devices that can only connect to the Internet through a centralized smart-home hub (e.g., Samsung SmartThings) using Zigbee or Z-Wave. Specifically, ChatterHub passively eavesdrops on encrypted network traffic from the hub and leverages machine learning techniques to classify events and states of smart-home devices. Using ChatterHub, an adversary can identify smart-home devices’ specific activities without prior knowledge of the target smart home (e.g., list of deployed devices, types of communication protocols). We evaluated the accuracy and efficiency of ChatterHub in three real-world smart-home environments, and the evaluation results show that an attacker can successfully disclose smart-home devices’ behaviors with over 88% F1 score. We further demonstrate that ChatterHub successfully recognizes privacy-sensitive activities, including the opening and closing of a smart door lock and the turning on and off of a smart LED. Additionally, to mitigate the threats posed by ChatterHub, we introduce two approaches, packet padding and random sequence injection. These mitigation approaches can effectively prevent threats from ChatterHub with only 9.2 MB of additional network traffic per day.
Full Text Available
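The ChatterHub abstract above names packet padding as one mitigation but does not specify how it is done. As a minimal sketch of the general technique, assuming a fixed-block scheme (the function names, the 256-byte block size, and the zero-byte filler are all hypothetical choices, not details from the paper), payloads can be rounded up to a common size so that message length no longer distinguishes events:

```python
from typing import Iterable


def padded_length(length: int, block: int = 256) -> int:
    """Round a payload length up to the next multiple of `block` (at least one block)."""
    if length < 0 or block <= 0:
        raise ValueError("length must be >= 0 and block must be > 0")
    blocks = -(-length // block)  # ceiling division
    return max(block, blocks * block)


def pad_payload(payload: bytes, block: int = 256) -> bytes:
    """Append zero bytes so the payload fills a whole number of blocks.

    A real protocol would also need a length field (or similar) so the
    receiver can strip the padding; that is omitted in this sketch.
    """
    target = padded_length(len(payload), block)
    return payload + bytes(target - len(payload))


def padding_overhead(lengths: Iterable[int], block: int = 256) -> int:
    """Total extra bytes that padding adds across a sequence of payload lengths."""
    return sum(padded_length(n, block) - n for n in lengths)


if __name__ == "__main__":
    lock_open = b'{"lock":"open"}'
    lock_close = b'{"lock":"closed"}'
    # After padding, both messages have the same size on the wire.
    print(len(pad_payload(lock_open)), len(pad_payload(lock_close)))
```

Larger blocks hide more size variation at the cost of more added traffic; `padding_overhead` gives a quick way to estimate that cost for a given trace of payload sizes.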